Chapter 1
INTRODUCTION

Ten years ago, the practice of releasing test scores to the public
was not generally accepted. A study by the Educational Research Service,
conducted during the 1973-74 school year, showed that only 52 percent of
the school systems enrolling 12,000 or more pupils released standardized
test scores to the press (ERS, 1974). At about this same time, a "how
to" publication by the National School Public Relations Association
(NPRA, 1976) introduced a chapter on one state's experience in releasing
test scores with the following admission:

"Quite candidly, those associated with the Maryland Department of 
Education in 1974 approached the first time release of test results 
in panic." 

The situation in 1983 is quite different. The release of test
scores to the press and the general public is a common practice. Test
scores are considered a statistic in the public domain similar to
population estimates and tax rates. In some cases, public libraries even 
include school districts' annual test reports among their general 
reference materials. 

The issue today is not whether or not to release test scores, but 
rather what to release and how to release it. Further, it has been 
increasingly acknowledged that since the audience for test scores has 
different faces with different backgrounds or interests, the content and
format of reporting may also need to be varied. 

The purpose of this document is to address issues in the release of
test scores to a variety of audiences: parents, school board members,
school staff, the news media, and the general public. In the chapters
which follow we will discuss the kinds of information that such reports 
might include and suggest some strategies for presenting them. 

Before turning to the issue of how to report test scores, it is 
important to consider the question of exactly what one is trying to 
communicate. What information is the school district trying to get 
across? On the surface, this question has a simple answer: the purpose 
of reporting test scores is to tell an audience how well students did on 
some type of test. However, there is a second and equally critical 
purpose of reporting test scores: to provide the audience with an 
understanding of what test scores really mean and what they do not mean. 
This is a harder task for all involved. 

In the chapters which follow, we will look more closely at the 
issues involved in reporting test scores: the kinds of information to be 
reported, the reasons for including each, and some ways in which the 
information might be presented.

Chapter 2
OVERVIEW OF THIS REPORT



In the next several chapters, we describe the kinds of information
that a report on test scores might include. Although most districts
actually have several different reports on testing (reports to the board,
reports to parents, reports to school-based staff, etc.), we will begin
with the annual test report, the report issued to the board of education 
and the public, as it is typically the one which is the most formal and 
complete. This is also the report that is most widely read and usually 
forms the basis for the major press coverage that test scores receive. 
In subsequent chapters, we will talk briefly about reports to other 
audiences: parents and school staff. 

Our aim in presenting this information is to provide guidelines or 
recommendations for reporting test data rather than a set of 
prescriptions. Although our discussion will cover a wide range of areas,
we recognize that not all are likely to be included in reports by
individual school districts. Factors such as practical limits on the
kinds of data which are readily available and the political sensitivity
of the information may well affect what is included. 

Our recommendations are based both on our experiences in reporting 
test results and on an informal review of a sample of test reports from
school districts across the nation (see Appendix A). Although we do not 
claim that these reports are either exemplary or representative of 
current practice, they provided us with valuable insight into how
different districts have approached the problem as well as some practical
examples of how information is communicated. They also offer clear
evidence that although some consistent themes emerge, there is no one way
of doing things; both content and format differ considerably.

In the next several chapters, we discuss three areas of information
which we feel should be included in some way in an annual report on 
testing. These are: 

1) Descriptive Information 

2) Test Results 

3) Interpretive Cautions 

Where possible, examples which we feel are useful from annual test
reports by school districts have been included as illustrations.

Chapter 3

GENERAL REPORTS: DESCRIPTIVE INFORMATION



Three kinds of descriptive data should be included in a report: a
description of the testing program, a description of what the tests
measure, and a description of the test scores. Although these sound like
very basic and simple elements, review of existing reports indicates that
they are not always included.

DESCRIPTION OF THE TESTING PROGRAM 

A brief description of the testing program includes the names of the
tests used, how they were developed and/or normed, when the tests were
administered, and the grades in which they were administered. The test
name should include the form and/or levels used, to facilitate
comparisons with other test results. Information on norming should
include when the test was normed and whether separate norms are provided
for special subgroups, e.g., large cities or suburban districts.
Provision of administration dates is also useful, both to help in this
comparison and to indicate whether testing occurred at the beginning or
end of the school year. Exhibit 1 shows how the San Diego City Schools
describes its testing program in its report for the 1981-82 school year.
Included are data on the tests used, grades tested, dates administered,
and content covered.

Additional information which may be offered includes data on 
exemption criteria and percentage of students tested. These data can be 
very important. The same test score may well be interpreted quite

EXHIBIT 1
DESCRIPTION OF TESTING PROGRAM
(SAN DIEGO CITY SCHOOLS)



TESTS ADMINISTERED AND DATES

During the 1981-82 school year, state and nationally
standardized tests were administered districtwide to
approximately 50,000 San Diego students in Grades 3, 5, 6,
7, 11, and 12, to obtain data for two testing programs.
The programs are the California Assessment Program and the
Districtwide Testing Program. The types of tests and the
testing periods for these two testing programs were as
follows:

California Assessment Program

Survey of Basic Skills: Grade 3 administered in late
April and early May 1982, covering content areas of
Reading, Written Language, and Mathematics.

Survey of Basic Skills: Grade 6 administered in April
1982, covering content areas of Reading, Written
Language, and Mathematics.

Survey of Basic Skills: Grade 12 administered in
December 1981, covering areas of Reading, Written
Expression, Spelling, and Mathematics.

The California Assessment Program test at Grade 12 was
identical to the test used the previous six years. The new
third grade test was administered for the third time this
spring. Previously, Grade 3 pupils were tested only in the
content area of Reading. At Grade 6, a new test was
administered this spring for the first time.

Districtwide Testing Program

Comprehensive Tests of Basic Skills, Level C, Form [illegible],
administered to Grade 5 students in April 1982,
covering curriculum areas of Reading, Language, and
Mathematics (reported October 12, 1982, Report 330).

Comprehensive Tests of Basic Skills, Level H, Form U,
administered to Grade 7 students in April 1982,
covering curriculum areas of Reading, Language, and
Mathematics (reported October 12, 1982, Report 330).

Comprehensive Tests of Basic Skills, Level [illegible], Form S,
administered to Grade 11 students in November 1981,
covering curriculum areas of Reading, Language, and
Mathematics (reported this spring, Report 305).

The tests administered for Districtwide Testing Programs at
the elementary and junior high school levels were changed
to different grade levels in recent years to reduce the
amount of instructional time consumed by testing. Also,
the district program changed from CTBS, Form S to CTBS,
Form U this spring. More details may be found in Report
330.



This exhibit illustrates one approach to describing a
testing program. It includes information on the tests
used, the grades tested, dates of test administration, and
the content areas covered.



differently where 40 percent of the students have been tested as opposed
to 95 percent. It may be especially important to include information on
who is exempted and the percentage of students actually tested where a
district or school contains significant numbers of special education
students or students of limited English proficiency. Exhibit 2 shows one
format for reporting exemption data which is used by the Dallas
Independent School District. Data are presented by both race and
exemption criteria.

DESCRIPTION OF TEST CONTENT

This section should include descriptions of the specific skills
measured by each subtest and how the skills are measured (i.e., item
format). A discussion of the skills that are measured is needed because
subtest names frequently reflect the favorite jargon of a particular test
publisher and convey little meaning to someone not thoroughly familiar
with the specific test battery. Sometimes the subtest name uses highly
technical terms and requires formal understanding of an area, such as the
subtest name Structural Analysis (used on the California Achievement
Tests). Other times, the name may cover so many skills that the specific
ones being measured need to be stated. An example of this is Mathematics
Concepts (also used on the California Achievement Tests).

Exhibit 3 shows how the Washington, D.C., Public Schools describes
what is included in the Comprehensive Tests of Basic Skills in their 1982
report on test scores. This report provides, in addition to a
description of the test and subtest content, information on the number of
items included in each subtest at each grade. An alternative approach

EXHIBIT 2

PRESENTATION OF EXEMPTION DATA 
(DALLAS INDEPENDENT SCHOOL DISTRICT) 



Summary of Exemptions from Components of Systemwide Testing Program

This exhibit shows one method of reporting the number
of students exempted from testing. For each of the
tests administered, data are presented on the numbers
of students exempted by exemption category as well as
the racial/ethnic and grade-level characteristics of
the students exempted.

EXHIBIT 3
TEST CONTENT DESCRIPTION
(DISTRICT OF COLUMBIA PUBLIC SCHOOLS)



Total Reading scores are obtained by combining the
Vocabulary and Comprehension scores. The Reading Vocabulary
subtest measures student skill in determining word meaning
from the context in which a word appears in a phrase.
Reading Comprehension items require the student to read
passages, letters, poems and articles, and then to answer
questions requiring literal recall, identification of main
idea, critical comprehension, ability to draw conclusions and
other reading skills.

Total Mathematics scores are obtained by combining the
Computation, Concepts and Application scores. The
Mathematics Computation subtest contains items requiring
addition, subtraction, multiplication and division of whole
numbers, fractions, decimals and algebraic expressions.
Mathematics Concepts measures the student's ability to
convert concepts expressed in one numerical, verbal or
graphic form to another form and to comprehend numerical
concepts and their interrelationships. Finally,
Mathematics Application items measure the ability to carry
out problem solving operations.

Total Language scores are obtained by combining the
Language Mechanics, Language Expression and Spelling
scores. The Language Mechanics subtest measures student
skill in capitalization and punctuation. The Language
Expression subtest measures correctness and effectiveness
of language usage, diction, economy and clarity of
expression, and skill in organization. The Spelling test
measures the ability to recognize spelling errors.

The Reference Skills test assesses knowledge of the
uses of a library, parts of books and standard reference
works. The Science items are related to the various
content areas of the physical and life science.

Exhibit 3, continued

* Separate scores are reported for Concepts and for
Applications.



This exhibit provides an example of how one district presents
detailed information on exactly what its testing program assesses. The
text provides a brief description of what the subtests measure and the
table shows how much attention is devoted to each of the general areas.

taken by the San Diego City Schools is presented in Exhibit 4. The use
of pie graphs to display this information is somewhat unusual, but
clearly communicative.

DESCRIPTION OF TEST SCORES 

The final area is that of description of test scores. This is where
the metric being used to report test data is presented and should be
defined. The contents of this type of discussion will clearly vary
depending upon the actual scores being used, e.g., percentile ranks,
grade equivalents, etc., and the kinds of test being considered, e.g.,
criterion-referenced vs. norm-referenced tests. The critical factor is
the presentation of some definition after the metric is introduced.
Exhibit 5 presents an excerpt from the 1981-82 test report of the Houston
Independent School District, in which definitions are provided for two
different metrics used in reporting their test scores: grade equivalent
scores and percent mastering each objective. These descriptions not only
provide a clearly understood definition for each of the terms but also
suggest possible pitfalls in their interpretation. This important area
will be discussed in greater detail in a later chapter. Appendix B
presents definitions for some commonly used test terms with cautions
concerning their usage. Several elementary testing textbooks also have
considerable discussion of these terms. See Appendix C for a list of
these books.

EXHIBIT 4

PIE GRAPH SHOWING CONTENT DISTRIBUTION 
(SAN DIEGO CITY SCHOOLS) 



(Pie graphs: READING, WRITTEN LANGUAGE)




This exhibit illustrates an alternative way of
describing what is assessed by a testing program using
pie charts rather than text and tables.

Chapter 4
GENERAL REPORTS: TEST RESULTS



Annual test reports generally include two types of data: overall
district results and results for individual schools. These are usually
presented in a very similar fashion, using the same descriptors and 
addressing the same basic questions. 

DISTRICT RESULTS 

Annual reports on districtwide results commonly present two types of 
information: information on how well the typical or average student 
performs, and information on how performance differs among students. In
addition, annual test results may be supplemented by historical data 
which assist in the interpretation of performance in any single year. 

The particular metric used for displaying average performance
will vary depending on the type of test. In reporting data on
norm-referenced standardized tests, average scores, reported in terms of
stanines, percentile ranks, or grade equivalents, are generally
presented. Sometimes districts also report the percentage of students
scoring above some reference point, typically the national mean or
median. In reporting scores on criterion-referenced tests, the results
are usually presented in terms of percentage passing. While tables are
frequently used to present these data, graphical displays are especially
useful.

Information on how performance differs among students can be
communicated by presenting a frequency distribution. One way to
accomplish this would be to report the percentage of students falling
into each quarter of the national norms. This is the way in which the
Albuquerque Public Schools presents such information (Exhibit 6).
Another approach is to present the data using stanines, which show the
spread of scores in a little more detail than national quarters. Exhibit
7 shows how information on spread of scores was presented by the
Montgomery County Public School system in their 1981-82 Annual Test
Report. This exhibit not only provides information for the county but
also includes comparative data from the national norm sample.
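
The quarter and stanine tabulations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a procedure taken from any of the district reports. The function names and the sample percentile ranks are hypothetical, the quarter cuts are assumed to fall at percentile ranks 25, 50, and 75, and the stanine bounds assume the conventional 4-7-12-17-20-17-12-7-4 percent split of the norm group.

```python
from collections import Counter

# Upper bounds (exclusive) on percentile rank for stanines 1 through 8;
# ranks at or above the last bound fall in stanine 9. These assume the
# conventional 4-7-12-17-20-17-12-7-4 percent split of the norm group.
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def check_rank(percentile_rank):
    """Reject values outside the usual 1-99 percentile rank range."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")


def national_quarter(percentile_rank):
    """Return 1-4 for the national quarter (assumed cuts at 25, 50, 75)."""
    check_rank(percentile_rank)
    if percentile_rank <= 25:
        return 1
    if percentile_rank <= 50:
        return 2
    if percentile_rank <= 75:
        return 3
    return 4


def stanine(percentile_rank):
    """Return the stanine (1-9) corresponding to a national percentile rank."""
    check_rank(percentile_rank)
    for index, upper in enumerate(STANINE_UPPER_BOUNDS):
        if percentile_rank < upper:
            return index + 1
    return 9


def percent_distribution(values, categories):
    """Percentage of values in each category, rounded to one decimal."""
    if not values:
        raise ValueError("no scores to tabulate")
    counts = Counter(values)
    return {c: round(100 * counts[c] / len(values), 1) for c in categories}


# Hypothetical national percentile ranks for eight students.
sample_ranks = [12, 30, 45, 51, 66, 78, 90, 97]
quarters = percent_distribution(
    [national_quarter(r) for r in sample_ranks], range(1, 5))
stanines = percent_distribution(
    [stanine(r) for r in sample_ranks], range(1, 10))
print("Percent in each national quarter:", quarters)
print("Percent in each stanine:", stanines)
```

Reporting both tabulations from the same list of ranks shows the trade-off noted in the text: quarters give four easily read figures, while stanines spread the same students over nine categories.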

Another way to look at performance differences among students is to 
present test scores by socioeconomic status (SES), by the major
racial/ethnic groups, and/or by gender. Although it is recognized that
reporting such information can be politically sensitive, these data can 
be useful in identifying areas where special efforts may be needed. The 
formats for reporting described in the previous paragraph are equally 
applicable here. We would like to stress, however, that reporting score 
distributions may be especially important. Average scores may give the 
impression that students from different groups perform very differently.
Although this may be true on the average, it is also important to show 
that most groups have some members with high scores and some with low 
scores, no matter how high or low their average scores are. 
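
The point about overlapping distributions can be made concrete with a short sketch. The group labels and scores below are invented for illustration, the scale and the reference point of 50 are assumptions, and `summarize_group` is a hypothetical helper rather than a method used by any district.

```python
from statistics import mean


def summarize_group(scores, reference=50):
    """Mean, lowest and highest score, and percent scoring above a reference point."""
    if not scores:
        raise ValueError("no scores to summarize")
    above = sum(1 for s in scores if s > reference)
    return {
        "mean": round(mean(scores), 1),
        "low": min(scores),
        "high": max(scores),
        "pct_above_reference": round(100 * above / len(scores), 1),
    }


# Hypothetical scores on a common scale for two groups of students.
groups = {
    "Group A": [15, 35, 55, 70, 85, 95],
    "Group B": [5, 20, 30, 45, 60, 90],
}
for name, scores in groups.items():
    print(name, summarize_group(scores))
```

In this made-up example the two averages differ noticeably, yet both groups contain students scoring near the bottom and near the top of the scale, which is the information an average alone would hide.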

One caution in grouping students by SES must be mentioned here. SES
information can be useful in helping audiences to understand test
results, since standardized test scores have repeatedly been shown to be
highly related to SES variables such as parental income, parental
education, and parental occupation. However, while SES data provide a

EXHIBIT 6

REPORT OF STUDENTS SCORING IN
EACH NATIONAL QUARTER

(ALBUQUERQUE PUBLIC SCHOOLS)

This exhibit shows how data on the number and percent
of students scoring in each quarter of the national
norm group can be used to report the distribution of
test scores.

EXHIBIT 7

GRAPHIC PRESENTATION OF DISTRICT
AND NATIONAL STANINE DISTRIBUTION
(MONTGOMERY COUNTY PUBLIC SCHOOLS)



CALIFORNIA ACHIEVEMENT TESTS, FALL 1981
DISTRIBUTION OF STANINE SCORES ON
THE TOTAL BATTERY FOR ALL GRADES TESTED

(Legend: MCPS)




This exhibit shows an alternative way to report score
distributions using a graphic, as opposed to a tabular,
display. Using overlays, this exhibit also shows how
data on local score distributions can be compared to
those in the national norm sample.




partial explanation for some antecedents of low test performance, such
data must not be used to justify continued lack of academic success. In 
other words, such data should not be used to explain away the problem of 
low test performance nor relieve the school of the responsibility for 
trying to increase learning. 

Our discussion so far has focused on reporting test scores for a 
single year. It can be useful to put such annual test results in a 
historical perspective to judge whether achievement is improving or 
declining. The historical data can be presented in one of two 
ways: cross-sectionally or longitudinally. Cross-sectional data show the
results for each grade tested each year. All students tested in each
grade are included. These results simply show if the scores for each
grade in a given year were higher or lower. Exhibits 8 and 9 show two
alternative ways of presenting such data: bar graphs used by the Los
Angeles Unified School District (1981-82 test report) and line graphs
used by the San Diego City Schools (1981-82 test report). The latter
also compares city results to those of the state. 

Since cross-sectional displays provide data for different students 
each year, any trends could be caused by changing student ability, not 
quality of instruction. To eliminate the possible changes in ability
levels, longitudinal data are needed. Longitudinal data show the trend of
scores across two or more years for students tested in all years. This
could show not only whether achievement was improving or declining but
also provide some indication of the quality of instruction. However, to
be able to do this, it is necessary to control for differences in the

EXHIBIT 8

REPORTING CROSS-SECTIONAL DATA USING
A BAR GRAPH (LOS ANGELES UNIFIED SCHOOL DISTRICT)




This exhibit shows one method of reporting historical, 
cross-sectional data using bar graphs to illustrate 
changes in performance over time.

EXHIBIT 9

REPORTING CROSS-SECTIONAL DATA USING
A LINE GRAPH (SAN DIEGO CITY SCHOOLS)




Reading     Written Language     Mathematics

FIGURE 1

GRAPHIC DISPLAY OF GRADE 3 SCALE SCORES
(District and State)



This exhibit shows an alternative way to present
historical cross-sectional data. In addition, in this
exhibit, statewide data have been added to provide a
reference group for the local data of interest.

tests at each grade level. This will be discussed in detail in a later
chapter on how to use test data.
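
The difference between the two kinds of historical data can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (student, year, grade, score), the student identifiers, and the scores are all hypothetical, and the scores are assumed to be on a scale that is already comparable across grade levels, which is exactly the control the text says is needed.

```python
from collections import defaultdict
from statistics import mean


def cross_sectional_means(records):
    """Mean score for each (year, grade), using every student tested."""
    buckets = defaultdict(list)
    for _student, year, grade, score in records:
        buckets[(year, grade)].append(score)
    return {key: round(mean(scores), 1) for key, scores in sorted(buckets.items())}


def longitudinal_means(records):
    """Mean score per year, restricted to students tested in every year."""
    years = {year for _student, year, _grade, _score in records}
    by_student = defaultdict(dict)
    for student, year, _grade, score in records:
        by_student[student][year] = score
    matched = [s for s in by_student.values() if set(s) == years]
    if not matched:
        raise ValueError("no student was tested in every year")
    return {year: round(mean(s[year] for s in matched), 1) for year in sorted(years)}


# Hypothetical records: (student, year, grade, score on a common scale).
records = [
    ("s1", 1981, 3, 40), ("s2", 1981, 3, 50), ("s3", 1981, 3, 60),
    ("s1", 1982, 4, 45), ("s2", 1982, 4, 55), ("s4", 1982, 4, 80),
    ("s5", 1982, 3, 58), ("s6", 1982, 3, 62),
]
print("Cross-sectional:", cross_sectional_means(records))
print("Longitudinal:", longitudinal_means(records))
```

In this invented example the Grade 3 average rises by ten points from one year to the next, but the Grade 3 classes consist of different students. The two students tested in both years gained five points, a figure unaffected by who moved in or out.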

SCHOOL RESULTS

Average scores, percentage passing, and score distributions may be
presented for each school in a district summary report in a fashion
similar to that used in presenting districtwide data. It might be best
to limit the distribution here to number and/or percentage in each
national quarter to minimize the data that the reader has to deal with.
A slightly different way to present the data is to show the scores for
the student at each quartile in the school. Exhibit 10 shows how the
Montgomery County Public Schools presented this information. School
staffs may be sent more detailed frequency distributions, as well as data
such as performance by objective, in a separate memo. Reports to school 
staffs will be discussed in more detail in a later chapter. 

Results by race and sex for each school are also useful, but should
only be presented if the groups are large enough to provide good data.
Since mean scores for small groups can be affected by a few extreme
scores, reporting results by race or sex for small groups can lead to 
misinterpretation. 
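
The sensitivity of small-group means to extreme scores is easy to demonstrate. The sketch below uses invented scores and a hypothetical helper, `mean_shift_from_one_score`, to compare how far one very high score moves the mean of a group of five with how far it moves the mean of a group of one hundred.

```python
from statistics import mean


def mean_shift_from_one_score(scores, extreme_score):
    """Change in the group mean when one extreme score is added."""
    return round(mean(scores + [extreme_score]) - mean(scores), 1)


# Hypothetical scores: a group of 5 and a group of 100 with the same mean of 50.
small_group = [48, 50, 52, 50, 50]
large_group = small_group * 20

print("Shift in small group:", mean_shift_from_one_score(small_group, 99))
print("Shift in large group:", mean_shift_from_one_score(large_group, 99))
```

With these made-up numbers the single added score raises the small group's mean by about eight points and the large group's mean by about half a point.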

Historical data can also be useful for schools. In the case of 
schools, it is even more important to use longitudinal data than for the 
district because factors such as SES and ability that distort
cross-sectional data have an even greater impact at the school level than
at the district level. Once these factors are eliminated, it is much

EXHIBIT 10

USING BAR GRAPHS TO SHOW TEST SCORE SPREAD IN SCHOOLS
(MONTGOMERY COUNTY PUBLIC SCHOOLS)



NATIONAL PERCENTILE RANK FOR THE STUDENT SCORING AT EACH SCHOOL'S
FIRST QUARTILE (Q1), MEDIAN, AND THIRD QUARTILE (Q3)
CALIFORNIA ACHIEVEMENT TESTS, GRADE [illegible] TOTAL BATTERY, 1981-82

This exhibit shows how bar graphs can be used to
provide information on the average score and
distribution of scores for individual schools.

more likely that an increase or decrease in performance is related to the
school program.

SES data for a school can be very helpful in evaluating its test
results. Once again, it should be noted that SES factors should be used
to help understand test results, not to justify low performance.

Chapter 5

GENERAL RESULTS: INTERPRETIVE CAUTIONS



One of the most difficult and frustrating aspects of reporting test
results is to assure that the results are interpreted and used as
accurately as possible. This is not an easy matter, as most people think
they know what a test score means although very few people really do. An
example of the confusion which too often surfaces can be illustrated
using grade equivalent scores. What does a grade equivalent score of 7.2
mean? It means that a student is working on the level of a
seventh-grader in his or her second month of school. Right? Wrong. But
this interpretation sounds right and is far too commonly heard. In fact,
some of the most dangerous and common misinterpretations occur where what
sounds right or what makes good common sense is technically wrong. 
Unfortunately, these misinterpretations are very difficult to reverse. 

In reporting test scores to the public it is, therefore, critical to 
provide cautions concerning how the data should and should not be 
interpreted. Exactly what these cautions are depends on the particular
data being reported and the kinds of tests being used. Listed below are 
some suggestions for inclusion, gleaned from areas where 
misinterpretation has been noted to occur frequently. Considered are 
problems in

• comparing scores across test batteries

• comparing data across grade levels

• interpreting normative data

• comparing the performance of different groups of students

• interpreting small changes in test performance



Appendix B provides information on cautions to be observed in using 
various types of test scores. In this section, we present some
additional cautions which should be kept in mind when interpreting
data. 

There are problems in comparing scores across test batteries.
People frequently want to compare scores across school districts where
districts do not use the same tests. Such comparisons are based on the
mistaken belief that most tests measure the same thing, achievement, and
that a test called reading comprehension on one battery is approximately 
equivalent to a test called reading comprehension on another battery. 
This can lead to some erroneous conclusions. There are several reasons 
for this caution. 

First, in norm-referenced tests (NRT), norms for each test are based
on a different group of students who may themselves differ in ability.
Although test developers attempt to obtain a nationally representative
sample for their norming groups, actually obtaining such a sample has
become increasingly difficult as more and more districts have refused to
participate in such endeavors. We simply do not know how the norming
groups for different tests compare or whether certain tests set a higher
standard of performance than others. For criterion-referenced tests
(CRT), the standard setting methods for different tests may create
problems analogous to those created by the development of norms for
norm-referenced tests.

Second, achievement tests differ in their content, despite the fact
that the names of tests or subtests may sound the same. Further, item
formats may differ even where the same objective is being assessed.
Depending on the type and extent of differences, the actual similarity of
what is tested may, therefore, vary widely. Exhibit 11 describes how
different item formats are used for tests of the same or similar names on
the California Achievement Tests and the Iowa Tests of Basic Skills.

Caution must be used in comparing results for different grade
levels, even where the same test battery is being used. The problems
described above in comparing scores across different achievement tests
are also found, although in slightly reduced form, in comparing scores
across different levels of the same test. Again, the norms are based on
different groups of students and we do not know for sure that the norm
group for one grade level was in fact similar in critical areas to the
norm group for another. While common sense suggests that in a
"nationally representative group" cohort differences balance out, we
simply cannot say with certainty that this is true. This issue becomes
especially troublesome where trends in performance across grade levels
are used to make some sort of judgment about the quality of instruction.

In attempting to make comparisons across grade levels, one must also
be concerned with a second problem: the comparability of match between
test content and curriculum content at the grade levels examined.
Because tests are designed to reflect a consensus regarding what might be
considered a national curriculum, they naturally do not reflect all local
curricula equally well. If this match varies across grades, performance

EXHIBIT 11

COMPARISON OF ITEM FORMAT ON TWO
STANDARDIZED ACHIEVEMENT TESTS



Spelling (ITBS)/Spelling (CAT)--The ITBS asks the student
to find an incorrectly spelled word in a list of words.
The CAT asks the student to find an incorrectly spelled
word in a sentence. Neither test asks the student to
actually spell words and could not within the constraints
of the optical scan format employed.

Vocabulary (ITBS)/Reading Vocabulary (CAT)--The ITBS asks
the student to find words that mean the same as a given
word. The CAT contains some questions asking for the same
meaning and some asking for the opposite meaning. It also
has a few questions involving words with multiple meanings. In
these questions, a definition is provided and the student
has to find the sentence in which the word is used with
that definition.



This exhibit describes how two different achievement tests
approach the measurement of the same skill, Spelling. This
illustrates the point that one cannot assume that two
subtests measure exactly the same skill simply because the
tests used the same skill name.

differences unrelated to the quality of instruction are likely to be
found.

This issue is very important in light of the not infrequent finding
that test scores appear to decline as one progresses through the school
years. Such a finding is typically interpreted as indicating that
students do more poorly the longer they have been in school. An
alternative hypothesis is that there is greater fidelity between test
content and curriculum content in the early grades than the later grades.
In the later grades, course content becomes more variable and a good
match is harder to find.

Standardized test norms relate a student's scores to those of a norm
group which took the test in the past when the test was standardized, not
to the current group of students being tested. When people see test
scores for a given year and percentile ranks showing how students
performed relative to a national sample, there is a tendency to assume
that the two groups took the test at approximately the same time. This
is not the case even for the most recent edition of tests. There is
generally one norming sample used for each edition of a test,
and, depending on when a test was normed, that sample may have taken the
test one to seven or more years ago.

Comparison of results for different groups of students can lead to
incorrect, sometimes harmful, conclusions. In the discussion above, we
have considered how some aspects of the tests themselves can influence
the performance of students and thus complicate the interpretation of
results. However, even when two groups of students exposed to similar
programs take the same test with the same norms or passing standard, it
is necessary to consider factors other than the scores themselves in
interpreting the findings and drawing conclusions about factors such as
the quality of instruction. The major factors to consider relate to the
socioeconomic status (SES) of students, indicated by variables such as
income, parental education, and parental occupation. All of these have
been shown to be highly related to standardized test performance, with
higher SES students tending to show higher test performance (other things
being equal).

Data on SES are not always available, either because the school
district does not have access to the information or because the school
district feels that the data are too difficult or sensitive to collect.
Thus, it is not always possible to partial out an SES effect. If this is
the case, it might nonetheless be useful to point out the importance of
the relationship as a partial explanatory variable. This may be
especially important where other factors are likely to be confounded with
SES. An example of such confounding occurs where data are reported by
racial/ethnic group or by school.

We could find no district that reports results by SES groups.
However, several did include some SES information in their reports. One
of the most thorough examples of this kind of reporting is shown in
Exhibit 12. This is taken from the Dade County District and School
Profiles, 1982-83.

Small test score differences should not be used to make educational
decisions. All test scores contain measurement error. This can be
caused by many things including ambiguous questions, how the student
feels when he/she takes the test, lucky guesses, or distractions

EXHIBIT 12

REPORTING SCHOOL SES DATA AND
STAFF CHARACTERISTICS (DADE COUNTY SCHOOLS)

This exhibit shows one way of including some data on
socioeconomic status (SES) in a report on test scores.
The amount of data presented on SES is quite limited,
however, as only data on the percentage of students
with free/reduced lunch are included.

occurring while the test is being administered. For these reasons, one
must be very cautious in interpreting small differences as indicating
meaningful differences in instructional quality or knowledge of skills.
This is especially true for individual student results, since the error
in scores for individual students tends to be much larger than that for
group data.

The problem is, however, equally important where larger groups of
students are concerned. Small differences in scores for large groups of
students may appear important because statistical tests indicate that
they are significant and, thus, unlikely to be caused simply by error.
Although this is true, in interpreting such findings, one must also
consider whether or not the difference is really meaningful, i.e., does a
difference of one percentile point, although statistically significant if
enough students are involved, merit major panic or euphoria on the part
of a school system? Perhaps a good test of importance can be made by
assessing how much money or how much change a school system would be
willing to spend or make to cause so small a change to occur.
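
The distinction between statistical significance and practical importance can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: it works in NCE units (standard deviation 21.06) rather than percentile points, assumes two groups of equal size, and uses a simple two-group z-test. The function names are hypothetical and the group sizes are made up.

```python
import math
from statistics import NormalDist

NCE_SD = 21.06  # standard deviation of the NCE scale


def two_group_z_test(diff, sd, n_per_group):
    """Return (z, two-sided p) for a difference between two group means.

    Assumes equal group sizes and a known, common standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def effect_size(diff, sd):
    """Express a difference in standard-deviation units."""
    return diff / sd


if __name__ == "__main__":
    # The same 1-point difference, tested with small and large groups.
    for n in (100, 10_000):
        z, p = two_group_z_test(diff=1.0, sd=NCE_SD, n_per_group=n)
        print(f"n per group = {n:>6}: z = {z:.2f}, p = {p:.4f}, "
              f"effect size = {effect_size(1.0, NCE_SD):.3f}")
```

With 100 students per group the one-point difference is nowhere near significant; with 10,000 per group it is, yet the effect size (about one-twentieth of a standard deviation) is identical in both cases. Significance reflects the number of students as much as the size of the difference.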
Chapter 6
REPORTS TO PARENTS



In the previous chapters, we discussed issues to consider in
presenting an annual test report. In this chapter, we turn to a second
audience: parents. Reports to parents are, in many respects, quite
similar to reports to boards of education and the general public.
Despite their focus on an individual student rather than a group, they
still should provide a description of the program, test results,
assistance, and cautions in interpretation. Typically, however, reports
to parents are presented quite differently, under the assumption that
parents, as laymen, must be given the information in a form which is both
briefer and easier to comprehend. The question and answer format is
popular (see Exhibit 13 taken from materials used by the Dallas
Independent School District) as are brochures; slide/tape presentations
are often used to provide an overview, and graphs and other pictorial
displays are frequently found. The trick here is to make the description
brief and easy to understand without, at the same time, appearing to
insult the intelligence of the audience.

The most difficult and most important part of the report to parents
is presenting the information on how their child performed. In reporting
actual scores, it is important to choose a metric which is relatively
easy to understand, and which can be readily defined. In reporting
scores from norm-referenced tests, stanines are a popular choice because
they appear on the surface to meet the criterion of ready




EXHIBIT 13

QUESTION AND ANSWER FORMAT FOR PROVIDING PARENTS
WITH INFORMATION ABOUT A TEST PROGRAM (DALLAS
INDEPENDENT SCHOOL DISTRICT)



TEXAS ASSESSMENT OF BASIC SKILLS - TABS

(STATE-MANDATED)



WHEN?    FEBRUARY
         EACH SCHOOL ESTABLISHES SPECIFIC DATES
         WITHIN THE GIVEN PERIOD

WHY?     ASSESS EDUCATIONAL NEEDS OF TEXAS
         PROVIDE PLANS FOR MEETING NEEDS
         EVALUATE ACHIEVEMENT

WHO?     ALL STUDENTS IN GRADES [illegible]
         OTHER STUDENTS IN GRADES 10, 11, AND 12

WHAT?    MINIMUM SKILLS MEASURE OF READING, WRITING

SCORES?  RESULTS REPORTED TO STUDENT, PARENT OR
         GUARDIAN, AND SCHOOL PERSONNEL
         PROCESSED BY TEXAS EDUCATION AGENCY (AUSTIN),
         RETURNED MAY



This illustrates one method for providing parents with
a description of a testing program. It is well suited
for the purpose of communicating with parents because
the question-and-answer format provides clear and quick
answers to frequently asked questions.

comprehensibility. As long as no one actually asks for a definition,
stanines may be a safe choice. However, many a test director has
squirmed his/her way through several uncomfortable minutes after a PTA
member has innocently asked, "What exactly are stanines?"
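
One way to be ready for that question is to show how a stanine is derived. The sketch below assumes the conventional stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norm group in stanines 1 through 9) and maps a national percentile rank to a stanine. The function name is hypothetical, and a publisher's own conversion table may differ slightly at the boundaries.

```python
from bisect import bisect_right

# Lowest percentile rank falling in stanines 2 through 9, obtained by
# accumulating the conventional percentages 4, 7, 12, 17, 20, 17, 12, 7.
STANINE_LOWER_BOUNDS = (4, 11, 23, 40, 60, 77, 89, 96)


def stanine_from_percentile(percentile_rank):
    """Map a national percentile rank (1-99) to a stanine (1-9)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return bisect_right(STANINE_LOWER_BOUNDS, percentile_rank) + 1


if __name__ == "__main__":
    for pr in (3, 4, 25, 50, 76, 77, 96):
        print(f"percentile rank {pr:>2} -> stanine {stanine_from_percentile(pr)}")
```

Under this convention, stanine 5 covers percentile ranks 40 through 59, so two students 19 percentile points apart can share a stanine while two students one point apart (76 and 77) do not.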

An alternative which may better serve communication is the national
percentile rank. Since percentile ranks have a range of 1 to 99, they
fall on a scale which seems both familiar and easy to use. Although they
may appear to convey greater precision than is justifiable, this fault is
not unique to national percentile ranks. In fact, regardless of the method
used for reporting individual scores, a relatively strong statement
should be included regarding the error in test scores and their
limitations.

Perhaps the single most critical thing in reporting to parents is
conveying the message that test scores are far from perfect indicators of
what a student has or has not learned. Materials accompanying such
reports should, therefore, be quite clear about the multiplicity of
factors that test scores may reflect. The fact that test scores
typically contain a good deal of imprecision cannot be overstated.
Unfortunately, the notion that an achievement test provides a precise
measure of learning is all too widely held. A good way to get across the
idea of test error on norm-referenced tests is to report scores using
score bands as shown in Exhibit 14. This is part of the report of
individual results used by the Pittsburgh Public Schools. This format
for presentation reinforces the concept that a test score is not really a
single point score, but an approximation. Such a display also helps to
curtail concern over a change in performance of one or two points.
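
The idea behind a score band can be sketched in a few lines. The example assumes scores on an equal-interval scale such as NCEs and a band of one standard error of measurement (SEM) on either side of the observed score. The SEM of 4 points is a made-up value for illustration; an actual report would use the figure published for the test and subtest in question. The function names are hypothetical.

```python
def score_band(observed_score, sem, width=1.0, low=1, high=99):
    """Return (lower, upper) limits of a band around an observed score.

    The band extends `width` standard errors of measurement on each
    side and is clipped to the limits of the reporting scale.
    """
    lower = max(low, observed_score - width * sem)
    upper = min(high, observed_score + width * sem)
    return lower, upper


def bands_overlap(band_a, band_b):
    """True when two score bands share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


if __name__ == "__main__":
    HYPOTHETICAL_SEM = 4  # illustrative value only
    fall = score_band(48, HYPOTHETICAL_SEM)
    spring = score_band(50, HYPOTHETICAL_SEM)
    print("fall band:", fall)
    print("spring band:", spring)
    print("bands overlap:", bands_overlap(fall, spring))
```

Because the two bands overlap, a two-point change of this kind gives no real basis for concluding that performance changed.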
EXHIBIT 14

USE OF ERROR BANDS AND TEXT TO
REPORT INDIVIDUAL STUDENT RESULTS TO PARENTS
(PITTSBURGH PUBLIC SCHOOLS)



[Exhibit 14 reproduces a computer-printed individual student report; only
fragments are legible in this copy. Legible subtest labels include
Reading Vocabulary, Spelling, Language Mechanics, Language Expression,
Math Concepts & Applications, Total Math, Total Battery, and Reference
Skills. A "Summary of Pupil's Scores" states how the student's total
scores compare with those of the nation's 4th graders in reading,
language, mathematics, and total battery. It then lists strengths (for
example, capitalizing the pronoun I, proper nouns, and proper adjectives;
obtaining information from the index of a book), weaknesses (for example,
spelling all the sounds in a word, using commas and quotation marks,
solving computation problems in division), and a skill to review
(obtaining information from a dictionary). A "Dear Parent" note explains
that the child is compared with other students in the same grade who took
the test throughout the nation, that the subtests are listed on the left
side of the chart, and that the rows of X's on the right side show how
well the child did on the tests.]

This illustrates an effective way of providing parents
with a report of individual student progress. Of
special importance is the use of test score bands which
readily illustrate the error that is part of test
scores. This report format is reprinted by permission
of the publisher, CTB/McGraw-Hill, Inc., 2500 Garden
Road, Monterey, CA 93940. Copyright © 1977, 1970, by
McGraw-Hill. All Rights Reserved. Printed in the
U.S.A.
The Pittsburgh report also shows how parents can be helped with a
few lines of text highlighting individual strengths and weaknesses. Here
the information on subtest performance has been supplemented by
information on particular objectives in order to make the data more
meaningful. This is useful as long as the test includes a sufficient
number of items per objective and possible varying difficulty level of
items and degree of objective coverage have been taken into account.
Without these controls, such data on objectives may be misleading.

Finally, in reporting to parents, it is critical to keep the
particular needs of this varied audience in mind. Presenting student
results in simple English, free of jargon, may not be enough. More and
more school districts are providing reports in languages other than
English where substantial numbers of parents are likely to have limited
English skills. An example from the Dade County Schools is shown in
Exhibit 15. While it is debatable whether or not non-English
alternatives should also be provided for reports such as the annual
reports of districtwide results, it is far clearer that reporting in
other languages is important where individual students are concerned.

It should be pointed out that not all school districts choose to
send a written report on test scores to parents. As an alternative,
scores are often conveyed verbally in some form of parent-teacher
conference. This approach has the advantage of providing the opportunity
for discussion between the parent and teacher and the chance for specific
questions to be raised and addressed. Unfortunately, because not all
teachers understand test scores equally well, it also sets the scene for
some widespread miscommunication which may go uncorrected and undetected.
EXHIBIT 15

REPORTING RESULTS TO PARENTS IN TWO
LANGUAGES (DADE COUNTY SCHOOLS)



STANFORD ACHIEVEMENT TESTS
DADE COUNTY SCHOOLS

[The body of the report form is not legible in this copy.]




This exhibit illustrates one approach to communicating
test scores to parents with limited or no English
skills. The critical information on student performance
is printed in two languages, English and Spanish.

It may be that some combination of a formal, consistent, written
communication and a personal conference with the teacher regarding the
specifics of the classroom situation provides the safest and best way of
reporting test score information to parents.
Chapter 7
REPORTS TO STAFF



The final audience to be considered here is school-based staff.
This audience is composed of people serving several different functions:
teachers, counselors, principals, and other specialists. Each has a
slightly different use for test score data and each wants data presented
in a slightly different form. While the approaches we have already
discussed (the annual test report and the report to parents, the
brochures and slide/tape presentations) partially meet these needs, they
are not sufficient. Other data and other formats are better suited where
staff use test data for program assessment and instructional decision
making.

Typically, districts provide this additional information through
school level reports or printouts which are intended primarily for use by
staff of a particular school and are not commonly shared with other
schools or the public. These are data displays rather than complete
reports; they assume a fairly knowledgeable audience and frequently have
little text or accompanying explanatory materials.

The exact contents and number of such reports, again, vary. One
district sends out as many as twenty different reports on testing to
schools annually. Others get by with far fewer. Information needs
appear to be dictated not only by accepted conventions, but also by the
specific concerns of a system in a given year. The list below provides
an idea of the variety of kinds of reports that may be sent to schools:

Individual Student Reports. These reports are similar to those
provided to parents. They list the total and subtest scores for
each student. There are often several copies of these so that
teachers and counselors can each have a copy.

Performance by Individual Classroom. These reports list the scores
for students aggregated to the classroom level.

Frequency Distributions. These reports provide a detailed
description of the spread of scores by including the number of
students achieving each possible score.

List of High and Low Performing Students. These reports supplement
the frequency distributions by indicating which students have
exceptional scores. This report might be of use as part of the
selection procedure for special programs or for grouping students
for instruction.

Performance by Objective. These reports show how well some aspects
of the curriculum are being mastered. When using these results, the
number of items for assessing each objective, the difficulty of
those items, and the extent to which the items measure the stated
objective must be considered.

Performance Across Years. These reports provide historical
summaries, either cross-sectional or longitudinal, of achievement
over time. The information they provide can be useful for
determining changes in school and student performance.

Performance by Feeder School. These reports show how well students
from different feeder schools performed and provide information for
the receiving school to use in planning the instructional program
for these students.

Performance by Special Program. Part of the evaluation of special
programs (e.g., Chapter 1, ESL, etc.) is to look at the test results
of students in those programs.

Frequently, these data are presented or at least reviewed in a
workshop-type setting using a variety of materials. This approach seems
favored over attempting to include all information in a self-contained
document, as is the case with reports to boards of education or parents.
These workshops serve a dual purpose. They permit testing personnel both
to communicate the information and to assure that the information is
being interpreted correctly. In addition, they allow school staff to
pose questions to the testers, which can lead to new analyses and/or
better use of the information. Ideally, all audiences should have the
opportunity to discuss test scores and receive assistance in their
interpretation. However, it is especially critical that this occur with
school staff. It is school staff that actually make decisions regarding
individual students based on test data and it is at the school level that
the impact of misinterpretation is the greatest. In large school
districts, meeting with each school each year to go over school-level
data may be overly ambitious, and some sort of staggered schedule may be
more practical. The critical thing is that school staff receive
sufficient opportunity for discussion and that reports to this audience
evoke interaction as well as comprehension.
Chapter 8
SUGGESTIONS FOR USING TEST DATA



The previous chapters presented the elements that might be included
in a report of test results and suggested some alternative strategies for
presenting information to different audiences. This chapter will offer
some suggestions about how one might go about answering questions
regarding test scores that are frequently asked by all of these
audiences. Here we are talking primarily about how the types of data
described earlier might be used to respond to some of the more common
questions posed. The emphasis here is on interpreting the data rather
than simply reporting them. Three commonly asked questions are listed
below.



1. How do a school's test scores compare with those of other
schools?

2. In what areas does the school need to improve?

3. Did the students in the school do as well as they should have?



COMPARING SCHOOLS

Although one might wish it were not the case, one of the most
popular uses of the data described earlier is to draw comparisons among
schools, in the hope of making some assessment of school quality.
Although using test data for this purpose is fraught with interpretive
problems, there are clearly more or less acceptable ways of approaching
this task. Too frequently, comparisons of schools are made simply by
looking at test scores and determining which are higher or lower. In the
extreme, this leads to a ranking from top to bottom with the high-scoring
schools considered "good" and lower-scoring schools considered "bad."

This approach can be extremely misleading because it totally ignores
other factors affecting test performance and attributes all variance to
the school. Unfortunately, we do not know of any totally satisfactory
way of using test data to determine which schools are effective and which
schools are not. However, suggested below are approaches which clearly
improve on the simple ranking method described above.

Combining Test Scores and SES Data. Since standardized test scores
are highly related to SES variables, it is likely that a school or group
of students with low SES will also have low test scores. Thus, this
relationship has to be accounted for so that schools with low or
declining SES are not automatically labeled as instructionally
inadequate. To avoid this kind of labeling, schools can be grouped
according to SES. The test scores of schools within each group can then
be compared to see how well each school is performing. An alternative
approach is to use regression analysis to combine SES variables and
produce "predicted" scores for each school and then to see which schools
perform substantially above or below this prediction. The critical point
here is that comparisons are made only among schools with students from
similar backgrounds. Again, however, one must repeat the caution
regarding the possible misuse of analyses which incorporate SES. SES
data can be used to help understand test results, but they should never
be used to provide a rationalization for tolerating low performance.
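
The regression approach can be sketched as follows. This is a simplified illustration, not a recommended procedure: it uses a single SES indicator (percentage of students receiving free/reduced lunch, the kind of figure shown in Exhibit 12) where a district would normally combine several, fits an ordinary least-squares line, and reports each school's actual mean score minus its predicted score. The school names and figures are invented, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def actual_minus_predicted(schools):
    """Map each school to its actual mean score minus its predicted score.

    `schools` maps a school name to (percent free/reduced lunch, mean score).
    """
    names = list(schools)
    xs = [schools[name][0] for name in names]
    ys = [schools[name][1] for name in names]
    slope, intercept = fit_line(xs, ys)
    return {
        name: schools[name][1] - (intercept + slope * schools[name][0])
        for name in names
    }


if __name__ == "__main__":
    # Invented data: (percent free/reduced lunch, mean NCE).
    example = {
        "School A": (10, 60),
        "School B": (30, 52),
        "School C": (50, 50),
        "School D": (70, 38),
        "School E": (90, 30),
    }
    for name, diff in actual_minus_predicted(example).items():
        print(f"{name}: {diff:+.1f} NCEs relative to prediction")
```

In this invented example School C ranks third on raw scores but is the only school performing well above its predicted score. As the text cautions, a predicted score describes what is typical for schools with similar students; it is not a target that excuses low performance.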

Longitudinal Analysis. Given the strong relationship between test
scores and SES factors and the potential danger of using low SES to
justify low performance, it may be better to deal with the qualitative
issue in another way. Longitudinal analysis, introduced earlier, can be
used to overcome the SES/test score relationship by using the same
students at all data points and by using score trends instead of absolute
values. However, one must keep in mind the possible problems posed by
differences in tests and test norms discussed previously. To account for
these differences, a baseline must be established. For example, the
baseline could be established from the results of all students in a
district who were tested in the same school in both Grade 2 and Grade 3.
The trends of such students in each school could then be compared with
this district baseline.
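
A minimal sketch of this baseline comparison is given below. It assumes each record holds a matched student's school and scores at the two testing points (for example, Grade 2 and Grade 3), expressed on an equal-interval scale such as NCEs. The district baseline is taken to be the average change for all matched students, and each school's average change is compared with it. The data and names are invented for illustration.

```python
from collections import defaultdict


def trends_against_baseline(records):
    """Compare each school's average score change with the district's.

    `records` is an iterable of (school, first_score, second_score) for
    students tested in the same school at both points in time.
    Returns (district_baseline, {school: school change minus baseline}).
    """
    all_changes = []
    changes_by_school = defaultdict(list)
    for school, first_score, second_score in records:
        change = second_score - first_score
        all_changes.append(change)
        changes_by_school[school].append(change)

    baseline = sum(all_changes) / len(all_changes)
    relative = {
        school: sum(changes) / len(changes) - baseline
        for school, changes in changes_by_school.items()
    }
    return baseline, relative


if __name__ == "__main__":
    matched_students = [
        ("North", 40, 44),
        ("North", 50, 52),
        ("South", 60, 59),
        ("South", 55, 56),
    ]
    baseline, relative = trends_against_baseline(matched_students)
    print(f"district baseline change: {baseline:+.1f}")
    for school, diff in relative.items():
        print(f"{school}: {diff:+.1f} relative to baseline")
```

Note that South starts with the higher scores but shows the weaker trend, which is exactly the distinction between absolute values and trends that the longitudinal approach is meant to bring out.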

Although the longitudinal analysis described above provides a
straightforward, fairly easy-to-understand way to use test scores to help
make judgments about school programs, it does not make it possible to
make the same judgment about the entire district, since it could be
difficult to develop baseline data from a larger group with the same
curriculum. About the best that can be done at the district level is to
establish the baseline from the trends from one academic year and then
compare the trends for all of the following years with that baseline.

Since longitudinal analysis involves looking at score trends, it
provides an excellent opportunity to present the results graphically.
Exhibit 16 shows the presentation of some longitudinal data in a report
from the Montgomery County Public Schools.

EXHIBIT 16

GRAPHIC DISPLAY OF LONGITUDINAL RESULTS
(MONTGOMERY COUNTY PUBLIC SCHOOLS)

This exhibit illustrates how longitudinal data (data on
the same students tested at two time points) can be
used to display trends in test performance over time.
This is proposed as an alternative way of presenting
historical data.
WEAK AND STRONG AREAS

People frequently wish to know how well a school is performing in
each academic area and what specific strengths and weaknesses exist.
Suggested here is a way of determining strengths and weaknesses by
comparing performance in each subject area to performance in all other
subject areas. This method assumes that all subtests are part of the
same test battery and no cross-battery comparisons are employed. Because
of the nature of the data from NRTs and CRTs, the way to use the data
from each will be a bit different. The approach to NRT data will be
presented first. It will then be modified to fit CRT data.

In presenting data to determine weaknesses or strengths, some
indication of the error in each test score should be considered along
with the absolute test scores. The inclusion of test error is needed to
prevent drawing the conclusion that a school is weak in math because its
score in that area is two points below its score in reading. A good
metric to use here is normal curve equivalent (NCE) scores. Their
equal-interval quality is needed to look at score differences.
Additionally, they will have the same meaning for all subtests in a
battery. Other equal-interval metrics, such as expanded scale scores,
are not appropriate as they do not have the same meaning for all
subtests. A standard should be set to determine meaningful differences
so that schools have some guidelines as to when special action will be
needed. The guideline may be determined using traditional tests of
statistical significance. However, the problem discussed earlier of
small differences being statistically significant in large groups can
apply here also. Given this situation, it may make more sense to specify
some amount of difference that appears to make intuitive sense. The
standard can be modified if it seems to over- or underidentify problem
areas.
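
The comparison described above might be sketched as follows. Each subject's mean NCE is compared with the average of the other subjects in the same battery, and a subject is flagged only when the difference reaches a locally chosen standard. The 5-NCE standard and the school means are invented for illustration; as the text notes, the standard is a matter of judgment and can be adjusted if it over- or underidentifies problem areas.

```python
def flag_strengths_and_weaknesses(subject_means, standard=5.0):
    """Flag subjects whose mean NCE differs meaningfully from the rest.

    `subject_means` maps each subtest in one battery to a school's mean
    NCE. `standard` is the smallest difference treated as meaningful.
    """
    flags = {}
    for subject, mean_nce in subject_means.items():
        others = [v for name, v in subject_means.items() if name != subject]
        difference = mean_nce - sum(others) / len(others)
        if difference >= standard:
            flags[subject] = "strength"
        elif difference <= -standard:
            flags[subject] = "weakness"
        else:
            flags[subject] = "no meaningful difference"
    return flags


if __name__ == "__main__":
    school_means = {"Reading": 55, "Math": 53, "Language": 54, "Spelling": 46}
    for subject, flag in flag_strengths_and_weaknesses(school_means).items():
        print(f"{subject}: {flag}")
```

Here math, two points below reading, is not flagged, while spelling, eight points below the average of the other areas, is.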

Group results for a CRT are generally a report of the percentage of
students passing each objective. A comparison of these percentages
passing on all objectives can be made just like the comparison of NCE
means described above for NRTs. However, to compare the percentages
passing each objective assumes that the objectives are both of equal
difficulty and are covered equally well by the curriculum. If this is
not the case, it will be necessary to determine if the differences in
difficulty are caused by an underlying skill hierarchy, by sloppy test
construction, or by weaknesses in the instructional program.

If results on different CRTs (e.g., reading and math) are being
compared, the caution presented earlier must be dealt with. That is,
there may be different standards on the two tests. To determine if this
is the case, you might choose an NRT that measures both subjects and see
if the results on that test are similar to those on the CRTs. If not,
the reason could be different standards. The recommendation that the
comparison test be an NRT is made because those results are not dependent
on standard setting.

Once the statistical operations described above have identified
areas of weakness or strength, the description of test content discussed
in the previous chapter can be used to help a school or district take
action. A list of the specific objectives included on the subtest can be
very helpful in isolating the skills that need to be improved or those
that are being taught very well.
DETERMINING IF A SCHOOL DID AS WELL AS IT SHOULD HAVE

Many people are aware of at least some of the reasons students or
groups of students do not all perform the same on achievement tests.
Thus, when scores for a given school are not at the top of the
distribution, the natural question is often, did the school do as well as
it should have done? We have no easy, incontrovertible way to respond to
that question. Some people argue it is simply a matter of administering
an abilities test, an achievement test, and then comparing the results of
the two tests to answer the question. However, this premise that there
are group-administered tests which measure something called "ability"
which can be distinguished from "achievement" has been severely
challenged. Standardized, group-administered ability tests usually
assess skills in reading, computing, and other areas that are learned and
which strongly resemble the skills assessed on achievement tests. Thus,
using the performance on one as a standard against which to measure the
other is highly questionable.

Given these real limitations in our measuring instruments, this
question cannot be answered absolutely. As an alternative, the question
which might be asked is whether a school is making appropriate progress.
To address this issue, one can use past achievement test scores to
predict future ones as described in the previous discussion of
longitudinal analysis. For NRTs, percentile ranks (or NCEs, which are
directly related to percentile ranks) would be a good metric to use for
this purpose. Prediction of performance is based on the following
assumption: if a group averages at the 85th percentile in Grade 3, it is
expected that the same group would perform at about the same percentile
level in succeeding grades, if normal progress were made. Deviations in
either direction indicate that something unusual is occurring. Again,
establishing a guideline for the point when a deviation becomes large
enough to be important must be based on professional judgment and
practical experience.
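
The prediction logic can be sketched in a few lines. The example converts national percentile ranks to NCEs using the standard definition (a normalized score with mean 50 and standard deviation 21.06), so that deviations are measured on an equal-interval scale, and then checks whether a group's later standing departs from its earlier standing by more than a guideline. The 5-NCE guideline is purely illustrative; the function names are hypothetical.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06


def percentile_to_nce(percentile_rank):
    """Convert a national percentile rank (1-99) to an NCE."""
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return NCE_MEAN + NCE_SD * z


def deviation_from_expected(earlier_percentile, later_percentile):
    """NCE change relative to the expectation of holding the same rank."""
    return percentile_to_nce(later_percentile) - percentile_to_nce(earlier_percentile)


def is_unusual(earlier_percentile, later_percentile, guideline=5.0):
    """True when the deviation reaches the locally chosen guideline."""
    return abs(deviation_from_expected(earlier_percentile, later_percentile)) >= guideline


if __name__ == "__main__":
    for later in (85, 80, 70):
        change = deviation_from_expected(85, later)
        print(f"85th -> {later}th percentile: {change:+.1f} NCEs, "
              f"unusual = {is_unusual(85, later)}")
```

Under this illustrative guideline, a group moving from the 85th to the 80th percentile would not be flagged, while a move to the 70th percentile would.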

It should be pointed out that for an analysis such as the one
described above, choice of a metric is critically important. For
example, grade equivalent scores would not be appropriate because it
would be extremely difficult to define normal growth from them. Students
at the 50th percentile might be expected to improve by 1 year for each
year in school. However, a student at the 80th percentile may improve,
depending on the grade, anywhere from 1.5 to 3 or more years in a school
year. Those at the 20th percentile may be doing well to improve 0.6 of a
year in that time.

On CRTs, percentile rank may be replaced by the number of objectives
passed at two points in time. Success at the second data point would be
determined by whether the school had achieved more or fewer objectives
than did the typical school which started with the same number achieved.
Determining how many more or fewer objectives passed constitute a warning
signal is a decision which must, again, be left to professional judgment.
Not too long ago, test results were considered the private domain of
teachers, counselors, and other school staff members. In the past ten
years, with the strong educational accountability movement, this is no
longer the case. The present task for the school district test director
is to communicate test results, not to make sure they remain
confidential. In this paper, we have tried to provide guidelines for how
this might be accomplished. We have discussed the contents of test
reports, and how the approach to reporting might be modified to meet the
needs of different audiences.

REPORT CONTENTS 

We have divided our discussion of report contents into three major
areas: Descriptive Information, Test Results, and Interpretive Cautions.
The descriptive information includes a description of the test program
such as names of test batteries, grades tested, and dates of
administration. A discussion of the skills measured by each subtest is
also important. Finally, this section should provide an explanation of
the types of scores that are used in the report.

We recommend that the reporting of test results include a measure of 
the average performance of the groups of interest (district, school, 
special programs, etc.). In addition, some indication of the dispersion of 
scores in the group should be provided. One way of doing this is by 
showing the percentage of students that scored in each quarter 
of the national norm group. Historical data should also be included to 
provide a picture of whether the achievement level in a school or 
district is improving or declining. Additional data that can be helpful 
to a district in planning instructional programs are results by 
racial/ethnic group, by sex, or by groups of students with similar 
socioeconomic status. 
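
As a rough illustration of the quarter display recommended above, the following Python sketch tabulates the percentage of a group's students who fall in each quarter of the national norm group. The function name, the sample percentile ranks, and the quarter boundaries (percentile ranks 1-25, 26-50, 51-75, and 76-99) are assumptions made for illustration; they are not taken from any district's report.

```python
from collections import Counter


def percent_by_national_quarter(percentile_ranks):
    """Return the percentage of students in each national quarter.

    Assumed boundaries: quarter 1 holds national percentile ranks 1-25,
    quarter 2 holds 26-50, quarter 3 holds 51-75, quarter 4 holds 76-99.
    """
    if not percentile_ranks:
        raise ValueError("at least one percentile rank is required")
    counts = Counter()
    for pr in percentile_ranks:
        if not 1 <= pr <= 99:
            raise ValueError(f"percentile rank out of range: {pr}")
        counts[(pr - 1) // 25 + 1] += 1
    total = len(percentile_ranks)
    return {quarter: 100.0 * counts[quarter] / total for quarter in (1, 2, 3, 4)}


# Hypothetical school: national percentile ranks of eight students.
school_prs = [12, 30, 45, 55, 60, 78, 81, 95]
distribution = percent_by_national_quarter(school_prs)
for quarter, percent in distribution.items():
    print(f"Quarter {quarter}: {percent:.1f}% of students")
```

If the group resembled the national norm group, each quarter would hold about 25 percent of the students, so departures from 25 percent show where the group's scores are concentrated.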

We also recommend that test reports clearly explain the limitations 
of test scores. Without such an explanation (and, unfortunately, 
sometimes even with it) people will almost assuredly misuse the results. 
Areas of special concern include the interpretation of grade equivalent 
scores; the comparison of performance across tests, grades, schools, or 
groups of students; and the interpretation of small changes in 
performance. 

Many of the elements that we have suggested be included in reports 
of test data are also mentioned in the Standards for Educational and 
Psychological Tests, published by the American Psychological Association 
(APA, 1974). One of the areas mentioned in that document is that the 
influence of race, sex, and socioeconomic status on test performance 
should be pointed out. The Standards also call for warning against 
common misuses of test scores and for providing sufficient information 
for correct interpretation. 

AUDIENCES 

Three audiences were discussed here: the board of education (and the 
general public), parents, and school-based staff. While the information 
needs of these audiences were judged to be similar in many respects, it 
was recommended that reports be somewhat differentiated in terms of 
comprehensiveness and format. 

Reports to boards of education (and the public) are generally the 
most formal and complete, including (where possible) most of the 
information reviewed above. Reports to parents are generally much 
briefer and deal with a child's performance, not with group data. These 
reports might contain a couple of sentences describing the program and a 
nontechnical explanation of the results or of how to interpret the data 
that are presented. More critical is a clear discussion of test error, 
since parents often feel that a 1 or 2 percentile rank change is a 
meaningful trend. Reporting scores with error bands can help in getting 
across the idea of test error. 
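
A minimal sketch of the error-band idea follows, written in Python. It assumes that the test publisher supplies a standard error of measurement (SEM) for the score scale being reported and that a band of one SEM on either side of the score is used; the SEM of 4 points, the two scores, and the function names are hypothetical.

```python
def error_band(score, sem, width=1.0):
    """Return (low, high) for a band of `width` SEMs around a score."""
    if sem < 0:
        raise ValueError("the standard error of measurement cannot be negative")
    half_width = width * sem
    return (score - half_width, score + half_width)


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Hypothetical scores for one child in two successive years, SEM assumed to be 4.
last_year = error_band(52, sem=4)
this_year = error_band(54, sem=4)

print("Last year:", last_year)
print("This year:", this_year)
if bands_overlap(last_year, this_year):
    print("The bands overlap; the change is within test error.")
else:
    print("The bands do not overlap; the change may be meaningful.")
```

Showing a parent the two overlapping bands, rather than the two single numbers, makes it easier to see why a change of a point or two should not be read as a trend.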

School-based staff members can probably use the most detailed 
reports on test data for their own school. This report need not, 
however, be as formal as the annual report presented to the school board. 
Often these reports are in the form of printouts with little accompanying 
text. This is because the staffs generally receive the same kind of 
reports each year and may not require much explanation after an initial 
workshop. These reports can include detailed frequency distributions; 
results for students grouped by class, score, or special program; and 
school historical trends. 

A FINAL WORD 

Looking over these chapters and the materials received from school 
districts, we feel compelled to ask, "How did something as simple as 
reporting test scores get so complicated?" 

We use printouts, bar graphs, pie graphs, tables, exhibits, 
brochures, overlays, and slide/tapes. We have formal reports, summary 
reports, conferences, and workshops. 

We could probably have doubled the length of this discussion had we 
singled out the "press conference" for additional attention. 
Undoubtedly, a summary of the approaches used in this area would comprise 
a valuable, and amusing, volume of its own. 

While the "art" of reporting test scores certainly can be improved, 
we know of no way to drastically streamline the task of reporting. There 
currently exists no all-purpose approach which can be adopted for all 
audiences and all districts, nor do we feel that one is likely to emerge 
in the near future. The needs of each group must be kept in mind and the 
format and content of each report modified accordingly. The critical 
thing is to keep in mind the question(s) that each audience needs to have 
answered and to provide the information which will allow accurate 
interpretation of the answers provided. 

If there is one point that we cannot make strongly enough, it is 
that reports to each and every audience must be structured to answer 
questions, not just provide numbers. This means that the knowledge base, 
concerns, and experience of an audience need to be considered very 
carefully. For some, this means printouts; for others, a simple letter. 
There is no longer any question about whether test scores should be 
released. The "how to do so" remains that which each of us must solve. 

APPENDIX A 

TEST RESULT REPORTING MATERIALS REVIEWED 

District 

Albuquerque 
Austin 
Charleston County 
Dade County 
Dallas 
Detroit 
District of Columbia 
Fort Worth 
Houston 
Los Angeles 
Memphis 
Montgomery County 
New Orleans 
Palm Beach County 

Report 

Comprehensive Tests of Basic Skills (CTBS) Testing, Spring 1982, 
District Report 
Individual Student Reports 
Student Achievement, 1982-83 
Report to Parents 
Results from the Spring, 1982, Norm-Referenced Testing Program 
District and School Profiles, 1982-83 
Parent Report 
An Interpretive Analysis of System-Wide Achievement Data, 1981-82 
Do You Know About Testing? - Topics for Parents 
School Achievement Indices 
Student Test Report 
Summary of Achievement Test Scores, 1982 
A Summary of Student Achievement on the Comprehensive Tests of 
Basic Skills 
Press release 
Elementary School Profiles, 1981-82 
Secondary School Profiles, 1981-82 
Report on the District Testing Programs, 1981-82 
Norm-referenced Test Results, 1981-82 
California Achievement Tests, A Practical Guide for Using and 
Interpreting the Results 
Annual Test Report, 1979-80 
Annual Test Report, 1981-82 
Testing Programs 1981-82, Summary of Results and Interpretive Guide 
Sample School Report 

Pittsburgh      Preliminary Report on Student Achievement in 
                the Pittsburgh Public Schools, School Year 
                1981-82 
                Report to Parents 

Portland, OR    General Orientation Manual for the Portland 
                Public School Achievement Testing Program 
                Portland Public Schools Achievement Levels 
                Tests, Sample Reports 

Rochester       Elementary School Profiles for Academic Year 
                1982-83 

San Diego       California Assessment Program Statewide 
                Testing Results by District and by School, 
                1981-82 School Year 
                Districtwide Testing Results by District and 
                by School, Grade 11, Fall 1982 

COMMONLY USED TEST TERMS 

This appendix provides information about commonly used test terms. 
Each term is defined. The definition is followed by a statement on 
its uses and a list of precautions to be observed when using the type 
of test or score being discussed. The terms are listed in 
alphabetical order. 

CRITERION-REFERENCED TEST (CRT) 

Definition 

A test based on specific learning objectives (or teaching objectives), 
usually within a narrow range of subject matter or skill. The tests 
are designed to measure the knowledge or skills the student has 
attained. The Maryland Functional Reading Test (MFRT) is an example 
of a CRT. 

Use 

CRTs provide information about the extent to which the student has 
attained the learning objective(s). 

Precaution(s) 

1. CRTs are often designed so a student can answer all or 
almost all of the questions correctly or incorrectly depending 
on the extent to which the student has attained the skills being 
measured. They are not designed to yield information about 
different levels of achievement and, therefore, cannot usually 
be used to rank students on specific skills. 

2. To be useful measures of specific skills, CRTs must have a 
sufficient number of questions measuring each particular skill 
included on the test. Although what is "sufficient" is not a 
fixed number, there should, in most cases, be at least five 
questions which measure a skill. A test purporting to be a CRT 
which has fewer than five questions per skill should be viewed 
with skepticism. 

GRADE EQUIVALENT SCORES (GE) 

Definition 

The grade equivalent of a given raw score on any test estimates the 
grade level at which the typical pupil achieves this raw score. The 
digit(s) to the left of the decimal point represent the grade; the 
digit to the right of the decimal point represents the month within 
the grade according to the following table: 

Number    Month 

  0       September 
  1       October 
  2       November 
  3       December 
  4       January 
  5       February 
  6       March 
  7       April 
  8       May 
  9       June-August 
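
The month table above can be applied mechanically. The Python sketch below splits a grade equivalent such as 5.4 into its grade and month; the function name is our own, and the score is passed as text so that the digit after the decimal point is read exactly as printed.

```python
# Month digit -> month of the school year, as given in the table above.
MONTHS = [
    "September", "October", "November", "December", "January",
    "February", "March", "April", "May", "June-August",
]


def describe_grade_equivalent(ge_text):
    """Split a grade equivalent written as text (e.g. "5.4") into (grade, month)."""
    grade_part, separator, month_part = ge_text.strip().partition(".")
    if not separator or not grade_part.isdigit():
        raise ValueError(f"not a grade equivalent: {ge_text!r}")
    if len(month_part) != 1 or not month_part.isdigit():
        raise ValueError(f"expected one digit after the decimal point: {ge_text!r}")
    return int(grade_part), MONTHS[int(month_part)]


for ge in ("3.2", "4.7", "5.4"):
    grade, month = describe_grade_equivalent(ge)
    print(f"GE {ge}: grade {grade}, {month}")
```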



An example of how a test publisher might derive grade equivalents can 
be useful in understanding GE. The example presented below represents 
the best methodology currently in use. Many tests are normed with 
fewer samples. 

If the publisher is norming a fourth grade test, he will test a 
representative sample in Grades 3, 4, and 5. In each grade, the 
sample, or two comparable samples, will be tested in the fall 
(November) and the spring (April). Thus, the grade levels being 
tested are 3.2, 3.7, 4.2, 4.7, 5.2, and 5.7. (Often publishers test 
only once a year.) 

The average raw test score for the students in each group is computed 
and plotted on a graph similar to the one below. The mean scores are 
indicated by points on the graph. All other grade-and-month values 
are estimated by interpolation between the means and extrapolation 
beyond the means. The GEs beyond the grade range of students in the 
norming sample should be regarded as no better than rough estimates. 
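
The derivation just described can be sketched in Python. The sketch assumes straight-line interpolation between adjacent means and straight-line extrapolation from the two nearest means; a publisher may well fit a smoothed curve instead. The mean raw scores in NORM_POINTS are invented for illustration and are not real norms.

```python
# (grade level tested, mean raw score) pairs; the raw-score means are hypothetical.
NORM_POINTS = [
    (3.2, 14.0), (3.7, 18.0), (4.2, 22.0),
    (4.7, 25.0), (5.2, 28.0), (5.7, 30.0),
]


def grade_equivalent(raw_score, norm_points=NORM_POINTS):
    """Estimate the GE for a raw score.

    Returns (ge, extrapolated). `extrapolated` is True when the raw score
    lies outside the range of the norming means, in which case the GE is
    no better than a rough estimate.
    """
    points = sorted(norm_points, key=lambda point: point[1])
    lower, upper = points[0], points[1]
    for left, right in zip(points, points[1:]):
        lower, upper = left, right
        if raw_score <= right[1]:
            break  # raw score falls in (or below) this segment
    (ge_low, raw_low), (ge_high, raw_high) = lower, upper
    slope = (ge_high - ge_low) / (raw_high - raw_low)
    estimate = ge_low + slope * (raw_score - raw_low)
    extrapolated = raw_score < points[0][1] or raw_score > points[-1][1]
    return round(estimate, 1), extrapolated


for raw in (19, 25, 32):
    ge, extrapolated = grade_equivalent(raw)
    note = " (extrapolated, rough estimate)" if extrapolated else ""
    print(f"raw score {raw}: GE {ge}{note}")
```

With these invented means, a raw score of 32 lies above the highest norming mean, so its GE comes entirely from extending the last line segment beyond the students actually tested.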



Figure B1 (graph of mean raw scores; horizontal axis: GRADE EQUIVALENT) 

Use 

GEs provide a familiar referent for test scores. 

Precautions 

1. The grade equivalent score does not indicate the grade level of 
work that a student can perform. It simply estimates the grade 
level of the typical student in the norming sample achieving a 
given raw score. For example, suppose a fourth grade student 
has a score with a grade equivalent of 5.4 on a fourth grade 
test. This does not mean that this fourth grade student can do 
work which is done in January in the fifth grade. It simply 
estimates that this student did as well on a fourth grade test 
as the typical student in January of the fifth grade. However, 
remember that if the norming sample for the fourth grade test 
did not include any fifth grade students, this estimate is very 
tentative. 

2. Grade equivalent scores should not be added and subtracted, 
because they are not an equal distance apart at all points. 
They are developed under an assumption that learning occurs 
at an equal rate throughout the school year. In fact, students 
tend to learn more at some times of the year than at others. 
From a strict statistical point of view, this lack of equal 
score intervals means that mean GE scores should not be computed. 
However, if the GE scores are converted to Normal Curve 
Equivalent scores, which do have this equal interval quality, 
the mean score computed from the converted scores is generally 
very close to that computed from the GEs, especially if the 
grade equivalents represent a wide range of possible scores. 

3. The attempt to build a scale based on the assumption of equal 
learning cited in Number 2 above results in differential GE 
gains for raw score changes. What occurs is that a one raw 
score point change may cause a one-month change in GE at one 
place in the norm table and a five-month gain elsewhere. The 
largest changes in GE generally happen in the extremes of score 
distribution. 

An example of the unequal GE differences between raw scores is 
shown below. These scores are taken from the Iowa Tests of 
Basic Skills (ITBS) seventh grade spelling test. 

Grade  Test      Raw Score  Grade Equivalent  Difference in 
                                              Grade Equivalent 

  7    Spelling      7            3.5 
                     8            4.0               .5 
                     9            4.4               .4 

  7    Spelling     25            8.4 
                    26            8.5               .1 
                    27            8.7               .2 


4. Grade equivalents generally have a wider range at higher grade 
levels. This leads to the situation in which a student who has 
the same PR in Grades 3 and 5 will probably be farther above (or 
below) the median in GE terms in Grade 5. This means that if 
he/she has a high PR in both grades, the gain in GE terms will 
be more than two years. If he/she has a low PR, the gain will 
be less than two GEs. Therefore, if a constant expected GE gain 
were established for all students, it would be too high for some 
and too low for others. The example below from ITBS norms 
demonstrates this problem. 



PR    Grade 3    Grade 5    Grade Equivalent Change 

90      5.1        7.5              2.4 
50      3.6        5.6              2.0 
10      2.5        4.1              1.5 



5. Because a grade equivalent score represents the performance of a 
typical student at a given grade level, approximately half of 
the students in a nationwide sample would be expected to score 
below grade level. 

6. Grade equivalents should not be compared across subject areas, 
because they have different meaning. For example, mathematics 
is more grade-related than reading; therefore, the GEs are 
generally less spread out for math than for reading. 

7. Grade equivalents should not be compared across different tests 
because they may have different means because of different 
norming samples. 

INTERQUARTILE RANGE 

Definition 

Quartiles are scores (points in a distribution) that divide a score 
distribution into quarters. Twenty-five percent of the scores are at 
or below the first quartile (Q1), 50 percent are at or below the 
second quartile (Q2, which is also the median), and 75 percent are at 
or below the third quartile (Q3). The interquartile range includes 
the band of scores that lies between Q1 and Q3, or the middle 50 
percent of the scores. 

Use 

By eliminating the effect of the lowest and highest quarters of the 
distribution, the interquartile range provides a measure of how the 
typical students in a group performed. 

Precaution(s) 

Eliminating the extreme scores may remove important information 
such as the location of pockets of students needing compensatory or 
gifted programs. If the median is close to either quartile, it could 
indicate a large number of students at that end of the distribution 
who might require such services. 
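
For readers who compute these figures themselves, a short Python sketch follows. It relies on the standard library's statistics.quantiles function; several conventions exist for locating quartiles in a small group, so other software may give slightly different values. The scores are hypothetical.

```python
import statistics


def quartile_summary(scores):
    """Return Q1, the median (Q2), Q3, and the width of the interquartile range."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return {"Q1": q1, "median": q2, "Q3": q3, "interquartile_range": q3 - q1}


# Hypothetical scores for a class of eleven students.
scores = [31, 35, 38, 40, 42, 45, 47, 50, 53, 57, 60]
summary = quartile_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")

# The precaution above: a median sitting close to one quartile suggests
# that many scores are bunched at that end of the distribution.
distance_to_q1 = summary["median"] - summary["Q1"]
distance_to_q3 = summary["Q3"] - summary["median"]
print("median to Q1:", distance_to_q1, " median to Q3:", distance_to_q3)
```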

MEAN 

Definition 

The sum of the scores divided by the number of scores. 

Use 

The mean is used as a measure of the performance of the "typical" 
student in a group. 

Precautions 

1. In a small group, the mean can be overly influenced by a few 
extreme scores. Thus, if a few scores in a distribution are 
very low but most are quite high, the mean will be depressed by 
the low scores more than the median. In groups where there are 
a few extremely low scores, the mean will, therefore, be lower 
than the median. Therefore, it is often useful to compare the 
mean with the median. 

2. Use of the mean provides no information about the spread of 
scores. 
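
The first precaution is easy to demonstrate with a small, hypothetical group. In the Python sketch below, five of the seven scores are high and two are very low; the scores are invented for illustration.

```python
import statistics

# Hypothetical small group: most scores are high, two are very low.
scores = [15, 20, 85, 88, 90, 92, 95]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

print(f"mean:   {mean_score:.1f}")
print(f"median: {median_score}")
print(f"the mean is {median_score - mean_score:.1f} points below the median")
```

Reporting both figures side by side, as recommended above, lets the reader see at once that a few extreme scores are pulling the mean down.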

MEDIAN 

Definition 

The score that divides a test score distribution in half is known as 
the median. Half of the scores are above the median, half are below. 
The median is the score that has a percentile rank of 50. 

Use 

The median is used as a measure of the performance of the "typical" 
student in a group. 

Precaution(s) 

1. See Precaution 1 for "mean." 

2. Use of the median provides no information about the spread of 
scores. 

NORMAL CURVE EQUIVALENT SCORES (NCE) 

Definition 

NCEs divide the normal distribution into 99 segments, units, or scores 
(Figure B2). Scores range from 1-99, with a mean/median of 50. NCEs 
can be related to percentile ranks as shown in the comparative scales 
in Figure B2. 

Figure B2 

Uses 

1. NCEs can be subjected to arithmetic operations. Therefore, mean 
NCEs can be computed, and differences in NCEs can be compared at 
all points in the score distribution. 

2. NCEs can be used in analyses of group data (for reasons above). 
In addition, NCEs are scaled to reveal small changes, something 
which stanine scores will not do consistently because of the 
large score range at each stanine point. 

Precaution(s) 

1. Use of NCEs for evaluating individualized performance is to be 
done with caution. A change of five NCE units on a test score 
is within the error range for individuals on most standardized 
tests. However, since NCEs give a false sense of precision (and 
hence of security), the careless test user could consider such a 
change meaningful. 

2. NCEs are difficult to interpret when presented alone. After an 
analysis has been performed on the basis of NCEs, results are 
often converted to some more readily understandable scale like 
percentile ranks. 
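
The conversion mentioned in the second precaution can be sketched in Python. The sketch uses the standard definition of the NCE scale as a normal distribution with a mean of 50 and a standard deviation of 21.06, a figure that is not stated in this appendix; the function names and the group's scores are our own, hypothetical choices.

```python
from statistics import NormalDist, mean

NCE_MEAN = 50.0
NCE_SD = 21.06  # assumed: standard deviation of the NCE scale by its usual definition


def nce_to_percentile_rank(nce):
    """Convert an NCE to the nearest whole percentile rank (1-99)."""
    z = (nce - NCE_MEAN) / NCE_SD
    pr = round(100.0 * NormalDist().cdf(z))
    return min(max(pr, 1), 99)


def percentile_rank_to_nce(pr):
    """Convert a percentile rank (1-99) to an NCE."""
    if not 1 <= pr <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return NCE_MEAN + NCE_SD * NormalDist().inv_cdf(pr / 100.0)


# Hypothetical group: average the NCEs, then report the mean as a percentile rank.
group_nces = [35.0, 48.0, 52.0, 61.0, 70.0]
group_mean_nce = mean(group_nces)
print(f"mean NCE: {group_mean_nce:.1f}")
print(f"percentile rank of the mean NCE: {nce_to_percentile_rank(group_mean_nce)}")
```

The order of operations matters: the averaging is done on the NCE scale, and only the final result is converted to a percentile rank for the reader.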

NORM-REFERENCED TEST (NRT) 

Definition 

The NRT is designed to rank students according to the number of test 
items answered correctly (i.e., according to raw score). Ranking is 
usually also done in relation to the performance of a norming sample. 
The California Achievement Tests is an example of an NRT. 

Use 

Norm-referenced tests identify those students who know the most about 
the content included on the test. 

Precaution(s) 

1. A good NRT is designed to enable between 40 and 70 percent of 
the examinees to answer any given item correctly. Many items 
are therefore too difficult for a majority of examinees to get 
right. This means that most NRTs are not very good tests of 
what an individual student knows (as opposed to 
criterion-referenced tests). Rather, they are measures of who 
knows the most about the test content. 

2. NRTs often include only one or two questions which measure 
achievement of a given skill or objective. Information about 
student performance on a particular objective is, therefore, 
usually not very reliable. 

In a strict statistical sense, it is probably incorrect to 
subject any test scores to arithmetic operations. However, NCEs, 
standard scores with an underlying normal distribution, raw scores, 
and stanines come closer than any other score scales to having 
equal-interval properties which permit arithmetic operations. 

PERCENTILE RANK (PR) 

Definition 

The percentile rank (PR) expresses the percentage of students in the 
norming sample who scored at or below a given score. For example, if 
a raw score of 30 has a percentile rank of 78, then 78 percent of the 
students in the norming sample scored at or below 30 items correct. 

Use 

PRs provide easily interpretable information about how a given 
student's performance on a test compares with the performance of 
students in the norming sample. 

Precaution(s) 

1. PRs should not be added nor subtracted because they are not an 
equal distance apart at all points. For example, Figure 3.2 
clearly shows that an increase of 10 points between percentile 
ranks 45 and 55 is not the same distance as an increase of 10 
points between percentile ranks 85 and 95. A person would have 
to show a larger amount of improvement to achieve the second 
increase. 

2. On a test of fewer than 100 questions, it is not possible for 
every whole number of the percentile rank scale to have an 
associated raw score. Therefore, in such circumstances, a 
one-point increase in raw score can cause an increase of several 
percentile rank units. What might appear to be a substantial 
increase on the percentile rank scale is really only an increase 
of one additional question correct. This caveat applies to 
virtually all tests in standardized batteries. 

3. Percentile ranks should not be confused with percent of correct 
answers (raw scores). They have completely different meanings. 
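
A brief Python sketch may help fix the definition. It computes a percentile rank exactly as defined above, as the percentage of the norming sample scoring at or below a raw score. Limiting the result to the range 1-99 is an assumed reporting convention, and the ten-score norming sample is hypothetical and far smaller than any real one.

```python
def percentile_rank(raw_score, norming_scores):
    """Percentage of the norming sample scoring at or below raw_score (1-99)."""
    if not norming_scores:
        raise ValueError("the norming sample is empty")
    at_or_below = sum(1 for score in norming_scores if score <= raw_score)
    pr = round(100.0 * at_or_below / len(norming_scores))
    return min(max(pr, 1), 99)  # assumed convention: report on a 1-99 scale


# Hypothetical (and unrealistically small) norming sample of raw scores.
norming_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

for raw in (30, 32, 33):
    print(f"raw score {raw}: PR {percentile_rank(raw, norming_sample)}")
```

In this toy sample, moving from a raw score of 32 to 33 (one more question correct) raises the percentile rank by 10 units, which is the effect described in the second precaution.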

RAW SCORE 

Definition 

Raw score represents the number of questions or test items answered 
correctly. 

Use 

Raw scores can be used to report the number of questions answered 
correctly. 

Precaution(s) 

1. A raw score has no meaning other than the number of items 
answered correctly. It provides no interpretative information. 

2. Raw scores can be quite misleading when reported by themselves 
because the meaning of raw scores differs from test to test. 
For example, if one 50-item test is easy and one 50-item test is 
difficult, a raw score of 30 on the difficult test might 
represent better performance than a raw score of 45 on the 
easier test. 

3. Subjecting raw scores to arithmetic operations (e.g., addition, 
etc.) is a questionable procedure. Generally, raw scores do not 
have the equal interval property required for these operations. 
This is because the same raw score can be obtained by different 
students who get different combinations of items correct. These 
items will most likely vary in their level of difficulty. Thus, 
identical raw scores will possibly represent differential levels 
of achievement. 

STANDARD DEVIATION (SD) 

Definition 

Standard Deviation (SD) is a measure of the dispersion in a set of 
scores. The closer the scores cluster around the mean, the smaller 
the SD will be. 

Use 

As a measure of the spread in a set of scores, the SD can be used to 
assist in determining the degree of importance of score differences. 
For example, a difference of 2 points would probably not have much 
meaning if the SD were 20 but could be quite important if the SD were 
0.5. 
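
The example above amounts to expressing a difference in SD units. The Python sketch below does this and also computes the SD of a small set of scores; the scores and the function name are hypothetical, and the population form of the SD (statistics.pstdev) is assumed.

```python
import statistics


def difference_in_sd_units(difference, sd):
    """Express a score difference as a multiple of the standard deviation."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return difference / sd


# The two cases from the text: the same 2-point difference against two SDs.
print(difference_in_sd_units(2, 20))   # a small fraction of one SD
print(difference_in_sd_units(2, 0.5))  # several SDs

# Hypothetical scores that cluster tightly around their mean of 50.
scores = [48, 50, 52, 50, 50]
sd = statistics.pstdev(scores)
print(f"SD of the scores: {sd:.2f}")
```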

STANINE 

Definition 

A stanine is one of the scores of a nine-point division of the normal 
distribution. Stanine scores range from 1 to 9 with a mean and median 
of 5. As shown in Figure B2, each stanine has a range of 
corresponding percentile ranks or raw scores. 

Uses 

1. Stanines can be subjected to arithmetic operations (addition, 
etc.). Therefore, the mean of distributions can be computed, 
and differences in stanine scores can be compared at all points 
in the distribution except, in some cases, at the extreme 
stanine scores of 1 and 9. 

2. Stanines do not give a false sense of accuracy of a given score 
because each stanine covers a range of raw scores. The stanine 
scale is therefore useful for reporting individuals' scores. 
Differences in stanines are more likely to represent change 
beyond that which can be attributed to error than are other 
kinds of scores. 

Precaution(s) 

As can be seen in Figure B2, interpretation of differences in stanine 
scores is clouded by the range within a given stanine. For example, 
if an individual's score increases from the top of the Stanine-3 range 
to the bottom of the Stanine-5 range, it represents less improvement 
than an increase from the bottom of the Stanine-3 range to the top of 
the Stanine-4 range. However, on cursory examination, it would seem 
as if the first increase were the greater. 
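
The precaution can be checked numerically. The Python sketch below assigns stanines from percentile ranks using the conventional division of the normal distribution into 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores. These cut points are an assumption on our part, since they are not listed in the text of this appendix, and the function name is hypothetical.

```python
from bisect import bisect_right

# Lowest percentile rank belonging to stanines 2 through 9, assuming the
# conventional 4-7-12-17-20-17-12-7-4 percent division of the distribution.
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile_rank(pr):
    """Return the stanine (1-9) for a percentile rank (1-99)."""
    if not 1 <= pr <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return bisect_right(STANINE_LOWER_BOUNDS, pr) + 1


# Case 1: top of the Stanine-3 range to the bottom of the Stanine-5 range.
# Case 2: bottom of the Stanine-3 range to the top of the Stanine-4 range.
for before, after in ((22, 40), (11, 39)):
    stanine_gain = stanine_from_percentile_rank(after) - stanine_from_percentile_rank(before)
    print(f"PR {before} -> PR {after}: gain of {stanine_gain} stanine(s), "
          f"{after - before} percentile rank points")
```

Under these assumed cut points, the first change is reported as a gain of two stanines although it covers fewer percentile rank points than the second change, which is reported as a gain of only one.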

Anastasi, Anne. Psychological Testing. Macmillan Publishing Co., New 
York, N.Y., 1982 

Cronbach, Lee J. Essentials of Psychological Testing. Harper & Row, 
New York, N.Y., 1970 

Ebel, Robert L. Essentials of Educational Measurement. 
Prentice-Hall, Englewood Cliffs, N.J., 1979 

Hopkins, Kenneth D., and Stanley, Julian C. Educational and 
Psychological Measurement and Evaluation. Prentice-Hall, Englewood 
Cliffs, N.J., 1981 

Mehrens, William A., and Lehmann, Irvin J. Measurement and Evaluation 
in Education and Psychology. Holt, Rinehart and Winston, Inc., New 
York, N.Y., 1973 

Thorndike, Robert L., and Hagen, Elizabeth P. Measurement and 
Evaluation in Psychology and Education. John Wiley & Sons, New York, 
N.Y., 1977 

APPENDIX D 
REPORTS OF TEST RESULTS 
cited in 

"RESEARCH AND EVALUATION STUDIES 
FROM LARGE SCHOOL DISTRICTS 1982"* 



ALBUQUERQUE PUBLIC SCHOOLS, NEW MEXICO 

New Mexico High School Proficiency Examination. Spring, 1980 
Test Results. Albuquerque Public Schools, 1980. (ERIC Document 
Reproduction Service No. ED 211 563). 

ATLANTA INDEPENDENT SCHOOL DISTRICT, GEORGIA 

McCarson, Carole. Reaching Achievement. Report No. 14-8. Atlanta 
Public Schools, June 1980. (ERIC Document Reproduction Service No. 
ED 210 665). 

McCarson, Carole. Results of the Admissions Testing Program for the 
Atlanta Public Schools' Seniors from 1975 to 1981. Atlanta Public 
Schools, February 1982. (ERIC Document Reproduction Service No. 
ED 217 068). 

McCarson, Carole. Results of the Georgia Statewide Testing Program 
for the Atlanta Public Schools, 1981. Atlanta Public Schools, 
Division of Research, Evaluation, and Data Processing, 1981. (ERIC 
Document Reproduction Service No. ED 217 067). 

AUSTIN INDEPENDENT SCHOOL DISTRICT, TEXAS 

Austin Independent School District Achievement Profiles, 1980-81. 
Volume I: Elementary Schools (Iowa Test of Basic Skills), 
Allan-Linder and District. Publication No. 80.83. Austin Independent 
School District, Office of Research and Evaluation, June 30, 1981. 
(ERIC Document Reproduction Service No. ED 209 290). 

Austin Independent School District Achievement Profiles, 1980-81. 
Volume II: Elementary Schools (Iowa Tests of Basic Skills), 
Maplewood-Zilker. Austin Independent School District, Office of 
Research and Evaluation, 1981. (ERIC Document Reproduction Service 
No. ED 209 291). 



* This bibliography, and earlier annual editions (1980,1981), are 
available from the ERIC Clearinghouse on Tests, Measurement, and 
Evaluation, Educational Testing Service, Princeton, HJ 08541-0001, 
for $6.00 each. 






Austin Independent School District Achievement Profiles, 1980-81. 
Volume III: Junior High Schools (Iowa Tests of Basic Skills) and 
Senior High Schools (Sequential Tests of Educational Progress). 
Austin Independent School District, Office of Research and Evaluation, 
June 30, 1981. (ERIC Document Reproduction Service No. ED 209 292). 

DETROIT PUBLIC SCHOOLS, MICHIGAN 

Summary of Achievement Test Scores, 1980. School-by-School Test 
Results. Detroit Public Schools, Department of Research and 
Evaluation, 1980. (ERIC Document Reproduction Service No. 
ED 208 051). 

PHILADELPHIA CITY SCHOOL DISTRICT, PENNSYLVANIA 

Grosswald, Jules. City-Wide Summaries, City-Wide and District 
Performance Distributions, Kindergarten through Grade Twelve. 1978-79 
Philadelphia City-Wide Testing Program, February 1979 Achievement 
Testing Program. Report No. 8004. Philadelphia School District, 
Office of Research and Evaluation, September 1979. (ERIC Document 
Reproduction Service No. ED 208 052). 

SAN DIEGO UNIFIED SCHOOL DISTRICT, CALIFORNIA 

Statewide and Districtwide Testing Results by District and by School, 
San Diego City Schools. December 1979 to October 1980. San Diego 
City Schools, November 1980. (ERIC Document Reproduction Service No. 

Ten years ago, the practice of releasing test scores to the 
public was not generally accepted. The issue today is not whether or 
not to release test scores, but rather what to release and how to 
release it. Further, it has been increasingly acknowledged that 
since the audience for test scores has different faces with different 
backgrounds or interests, the content and format of reporting may 
also need to be varied. 

The purpose of this report is to address issues in the release 
of test scores to a variety of audiences: parents, school board 
members, school staff, the news media, and the general public. It 
discusses the kinds of information that such reports might include 
and suggests some strategies for presenting them. 